Intonation modelling for the synthesis of structured documents
Abstract
This paper describes experiments on predicting a good intonation for the synthesis of structured documents. The paper extends our previous research in four important respects: (i) models are trained and evaluated on read text material rather than isolated sentences, (ii) the intonation model is evaluated while fully integrated in the entire prosody model chain, (iii) the feature selection process is completely automated, and (iv) the importance of typical text-level features such as text type, text structure and typesetting is investigated. Human readings of running texts exhibit a much richer intonation than that observed in read isolated sentences. We try to capture this richness in an intonation model that can be learned automatically using data-driven techniques. Our intonation models are RNNs (Recurrent Neural Networks) trained on prosodically labelled databases. Objective tests have demonstrated that acceptable intonation models can be constructed in this way, and that text type and text structure are important features whereas typesetting is not.
Similar resources
Intonation Modelling for the Synthes
Human readings of structured documents exhibit a much richer intonation than that observed in read isolated sentences. It is a challenge to capture this richness in an automatic way using data-driven techniques. In this paper, we extend our previous research on intonation modelling for isolated sentences in different respects: (i) the RNN (Recurrent Neural Network) intonation model is now traine...
Modelling Japanese intonation using PENTAtrainer2
This paper presents results from Japanese intonation modelling using PENTAtrainer2, an articulatory synthesiser. Our first aim is to show that PENTA, on which PENTAtrainer2 is based, can achieve high accuracy in predictive synthesis of varying intonation contours. We trained the synthesiser on a 6251-sentence functionally annotated corpus and generated F0 contours for each communicative conditi...
Optimized Selection of Intonation Dictionaries in Corpus Based Intonation Modelling
Data scarcity in corpus-based intonation modelling for TTS applications is addressed. We propose to apply a searching process to a list of dictionaries of classes of intonation patterns previously trained from a corpus, to avoid problems associated with the scarce number of samples in the classes. Results indicate an improvement in comparison with previous alternatives where the ...
F0 stylization and intonation modelling for Standard Yorùbá Text-to-speech application
This technical report documents an experiment on the stylization of the f0 curve on Standard Yorùbá (SY) syllables as well as a technique for intonation modelling. A number of interpolation polynomials were evaluated using root mean square error and mean opinion score techniques. The stylization experiment resulted in the selection of a third-degree polynomial for modelling the f0 curves on Yorùbá syll...